Background

In colon cancer, microsatellite instability status is a better marker for response to adjuvant
chemotherapy with fluorouracil than tumour stage (II or III). The majority of hereditary colorectal
cancer cases are microsatellite instable. We investigated whether colon tumors can be classified
by gene expression in crude biopsies, correlated the resulting classes with crude survival, and
investigated whether the gene expression profile can also distinguish hereditary from sporadic cases.

Methods 

Gene transcripts from tumour specimens were quantified using microarray technology. The tumors 
were clustered using unsupervised and supervised classification algorithms. Sets of genes were 
defined for classification of microsatellite instability status and of sporadic versus hereditary
microsatellite instable tumors. Real-time PCR was used to validate microarray data and to 
investigate platform dependency in a new independent set of 47 colorectal tumors. 



Results 

Unsupervised hierarchical clustering revealed that tumors were essentially separated according to 
microsatellite instability status. Supervised classification of the 97 tumor samples using a maximum
likelihood classifier with a crossvalidation loop resulted in three misclassifications as compared to
microsatellite analysis, using anywhere from 106 genes down to only seven genes. The stability of
classification of colon tumors in relation to microsatellite status was tested by permutation analysis.
The sensitivity for diagnosis of microsatellite stable tumors exceeded 99% with a specificity
exceeding 96%. The positive and negative predictive values exceeded 95% and 98%, respectively.
The classifier was demonstrated not to be platform dependent, as it could successfully be reproduced
by real-time PCR. This was further verified as the classifier also correctly classified 95.7% of a new
independent set of 47 colorectal tumors using real-time PCR.

Based on microarray data we identified ten genes that were highly correlated with hereditary 
disease. Using as few as two of these genes, 36 of 37 microsatellite instable tumors could be correctly
separated into sporadic and hereditary MSI-H colorectal tumors. 

Crude survival according to microsatellite status, as determined by the classifier, revealed that among
patients with stage II colon cancer receiving no adjuvant chemotherapy, those displaying microsatellite
instability had significantly longer overall survival than those with microsatellite stable tumors
(P=0.0014). By contrast, patients with Dukes' C tumors displaying microsatellite instability did not
have a significant increase in overall survival compared to patients with microsatellite stable tumors
(P=0.55).

Conclusion 

Colon cancer can be stratified into two molecularly distinct groups by quantification of the transcripts
of 106 genes or of as few as seven genes. The two groups are highly correlated with microsatellite
stable (MSS) and microsatellite instable (MSI) tumors. The 7-gene classifier clearly proved to be a
strong predictor of survival in Dukes' B, and it can be used to select patients who need adjuvant
chemotherapy, namely those classified as MSS. We demonstrate that this classification is also valid
when performed by real-time PCR analysis, allowing a fast diagnosis in a clinical setting. Finally,
sporadic and hereditary cases among tumors exhibiting microsatellite instability can be distinguished
based on gene expression monitoring.

Colon cancer is the fourth most frequently diagnosed malignancy and the second most common cause of
cancer death in the western world. The standard treatment of colon cancer is advised according to
tumor stage. Patients with Dukes' C colon cancer receive fluorouracil-based adjuvant systemic
chemotherapy in addition to surgical resection of the tumor, whereas the treatment for Dukes' B
patients is based on surgical resection alone.

There is accumulating evidence that these cancers belong to two distinct molecular types according
to genetic alterations: the mutator phenotype, featuring tumors with microsatellite instability (MSI),
and the suppressor pathway, displaying chromosomal instability and microsatellite stability (MSS).
MSI has been defined as a change of any length due to either insertions or deletions of repeating
units in a microsatellite within a tumor compared to normal tissue, and is caused by an underlying
defect in the mismatch repair (MMR) system (Boland et al., Cancer Res. 58: 5248, 1998). The MSI pathway
may be either sporadic or hereditary (HNPCC). Whereas the disruption of the MMR system in
sporadic MSI tumors is most often caused by somatic methylation of the MLH1 gene promoter,
more than 90% of HNPCC cancers are caused by germ line mutations in MLH1 or MSH2.
The MSS pathway to cancer begins with the inactivation of tumor suppressor genes, such as the
APC/β-catenin genes, followed by activation of oncogenes and inactivation of additional tumor
suppressor genes, commonly with a high frequency of allelic losses, cytogenetic abnormalities
and abnormal tumor DNA content. Many studies have defined the clinicopathological traits of MSI and
MSS tumors and found that MSI positive cancers are most frequently found in the right side of the
colon, tend to be less differentiated and larger in size, are often mucinous, and
often exhibit extensive infiltration by lymphocytes.

Crude survival data suggest that patients with HNPCC have a better prognosis than those with
sporadic disease [48,49,50] and studies have also shown that MSI is an independent indicator of

broad spectrum of tumors in relation to location, heredity, microsatellite instability status, and
origin of the patient. All tumors were collected in the period from 1994 to 2002. 68 tumor samples
were collected at nine different clinics in Finland and 33 samples at four different
clinics in Denmark. Of these, 36 were Dukes' B and 67 Dukes' C; 41 were microsatellite highly instable
(MSI-H), of which 17 were HNPCC, and 59 were sporadic microsatellite stable (MSS) (Table 1).
None of the patients received pre-operative radiation or chemotherapy.

Microsatellite-instability analysis. From all tumor samples available as paraffin blocks, ten
sections were cut at 10 µm and stained with haematoxylin. The first and last sections were cut at 4 µm
and stained with haematoxylin. These two sections were used for the identification of tumor and
normal cells in each sample. Regions enriched in tumor cells (more than 90%) were
microdissected from these sections and DNA was extracted using a Puregene DNA extraction kit
(Gentra Systems, Minneapolis, MN). DNA from blood samples was used as control when available;
otherwise normal tissue was microdissected from the tissue sections. The samples were analysed for
microsatellite instability according to the NCI guidelines (Boland et al.). Samples positive for
markers BAT25 and BAT26 were scored as MSI-H. Samples positive for only one of these markers
were tested for further markers and scored as MSI-L if none of these tested positive. Since MSI-L
has clinical features similar to MSS, these samples were considered as MSS in this study. In
addition to microsatellite analysis, all tumors from which paraffin blocks were available were tested
for the presence of MLH1 and MSH2 protein by immunohistochemistry. None of the samples
scored MSS were negative for either protein, whereas six of the samples scored MSI were positive
for both (Table 1).

RNA purification. Colon specimens were obtained fresh from surgery and were immediately snap
frozen in liquid nitrogen either as they were, in OCT compound, or in an SDS/guanidinium thiocyanate
solution. Total RNA was isolated using RNAzol (WAK-Chemie Medical) or spin column
technology (Sigma) following the manufacturers' instructions.

Gene expression analysis. These procedures were performed as described in detail elsewhere
(Dyrskjøt et al.). Briefly, ten µg of total RNA was used as starting material for the target preparation
as described. First and second strand cDNA synthesis was performed using the SuperScript II
System (Invitrogen) according to the manufacturer's instructions, except that an oligo-dT primer
containing a T7 RNA polymerase promoter site was used. Labelled aRNA was prepared using the BioArray
High Yield RNA Transcript Labelling Kit (Enzo) using biotin labelled CTP and UTP (Enzo) in the
reaction together with unlabeled NTPs. Unincorporated nucleotides were removed using RNeasy
columns (Qiagen). Fifteen µg of cRNA was fragmented, loaded onto the Affymetrix HG-U133A
probe array cartridge and hybridized for 16 h. The arrays were washed and stained in the Affymetrix
Fluidics Station and scanned using a confocal laser-scanning microscope (Hewlett Packard
GeneArray Scanner G2500A). The readings from the quantitative scanning were analyzed by the
Affymetrix Gene Expression Analysis Software (MAS 5.0) and normalized using RMA (robust
multi-array normalisation, Irizarry et al. 2002) in the statistical application R. Redundant probesets
(as defined from Unigene build 168) with high correlation (>0.5) over all samples were removed,
which reduced the dataset to approximately 14,400 probesets. This dataset was used as the source for all
further calculations in this manuscript.
Unsupervised agglomerative hierarchical clustering

For hierarchical cluster analysis, 1239 genes with a variation across all samples greater than 0.5
were median-centred and normalized to a magnitude of 1. Samples and genes were then clustered using average
linkage clustering with a modified Pearson correlation as similarity metric (Eisen et al., PNAS 95:
14863-14868, 1998). The cluster dendrogram was visualized with TreeView (Eisen).
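
The clustering step can be illustrated with a minimal sketch. It is not the Cluster/TreeView implementation used in the study: it assumes the standard (centred) Pearson correlation rather than the modified one, uses 1 - r as the distance, and runs on a small hypothetical genes-by-samples matrix. The names `expression`, `pearson` and `average_linkage` are illustrative only.

```python
from statistics import mean, median


def pearson(a, b):
    """Standard Pearson correlation between two equally long profiles."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def average_linkage(profiles):
    """Agglomerative clustering; returns the merge history as
    (members of cluster A, members of cluster B, average distance)."""
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean of all pairwise distances (1 - r).
                d = mean(1.0 - pearson(profiles[i], profiles[j])
                         for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Hypothetical data: rows are genes, columns are samples 0..3.
expression = [
    [5.0, 5.2, 1.0, 1.1],
    [4.0, 4.1, 0.5, 0.4],
    [1.0, 0.9, 6.0, 6.3],
    [0.5, 0.7, 3.0, 3.2],
]
# Median-centre each gene, then read off one profile per sample.
centred = [[v - median(row) for v in row] for row in expression]
profiles = [list(col) for col in zip(*centred)]
merges = average_linkage(profiles)
```

In this toy matrix samples 0 and 1 join each other, as do samples 2 and 3, before the two pairs are merged at a distance close to 2 (strongly anti-correlated profiles).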



Group testing 

We make a statistical test where the p-value is evaluated through permutations. For each group and
gene we calculate the average and the sum of squared deviations from the average. We then sum
these over the genes and the groups:

$$S_1 = \sum_{\text{genes } j} \; \sum_{\text{samples } i} \left( x_{ij} - \bar{x}_{gr(i),j} \right)^2$$

where $x_{ij}$ is the expression of gene $j$ in sample $i$ and $\bar{x}_{gr(i),j}$ is the average of gene $j$
in the group $gr(i)$ to which sample $i$ belongs.

The same expression is calculated after joining DK with SF, or MSI with MSS, such that we end up with
two groups. This sum of squared deviations is denoted S2. As a test statistic we use S1/S2. A small
value indicates that there is a real reduction in the deviations when going from 2 to 4 groups and
thus the groups have a real significance. To judge if a value is significantly small we use
permutations. For each of the groups left when joining DK and SF we randomly allocate the
members to a pseudo DK and a pseudo SF group in such a way that the number of members in each group
is as in the original data.
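
A minimal sketch of this permutation test, under stated assumptions, is given here for the case of joining DK and SF. The data are hypothetical (two genes, sixteen samples), and the function names `within_ss`, `ratio` and `permutation_test` are illustrative, not taken from the study.

```python
import random


def within_ss(samples, labels):
    """Sum over groups and genes of squared deviations from the group mean."""
    total = 0.0
    for g in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == g]
        for j in range(len(samples[0])):
            col = [m[j] for m in members]
            mu = sum(col) / len(col)
            total += sum((x - mu) ** 2 for x in col)
    return total


def ratio(samples, status, country):
    s1 = within_ss(samples, list(zip(status, country)))  # four groups
    s2 = within_ss(samples, status)                      # DK and SF joined
    return s1 / s2


def permutation_test(samples, status, country, n_perm=100, seed=0):
    """Return the observed S1/S2 and how many permutations gave a smaller value."""
    rng = random.Random(seed)
    observed = ratio(samples, status, country)
    smaller = 0
    for _ in range(n_perm):
        pseudo = list(country)
        # Shuffle country labels within each status group, keeping group sizes.
        for s in sorted(set(status)):
            idx = [i for i, x in enumerate(status) if x == s]
            shuffled = [country[i] for i in idx]
            rng.shuffle(shuffled)
            for i, c in zip(idx, shuffled):
                pseudo[i] = c
        if ratio(samples, status, pseudo) < observed:
            smaller += 1
    return observed, smaller


# Hypothetical data: gene 0 follows status, gene 1 follows country.
status, country, samples = [], [], []
for s in ("MSI", "MSS"):
    for c in ("DK", "SF"):
        for k in range(4):
            status.append(s)
            country.append(c)
            samples.append([(5 if s == "MSI" else 0) + k % 2,
                            (10 if c == "DK" else 0) + k // 2])

observed, smaller = permutation_test(samples, status, country)
```

With a strong country effect, as in this toy example, no permuted allocation gives a smaller ratio than the observed one, which corresponds to a count of 0 smaller values among the permutations.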

To get an understanding of this separation we performed a test to see if it is caused by a few genes
or if many genes are involved. For this test we wrote $S_1 = \sum_{\text{genes } j} S_1(j)$ and similarly
$S_2 = \sum_{\text{genes } j} S_2(j)$. For each gene $j$ we used the test statistic $S_1(j)/S_2(j)$ (Table 3).

Multidimensional Scaling

We carried out multidimensional scaling on median-centered and normalized data using
scale in the statistical application R and visualized the result in a two-dimensional plot.



Microsatellite status classifier 

The readings from the quantitative scanning were analyzed by the Affymetrix Gene Expression
Analysis Software (MAS 5.0) and normalized using RMA (robust multi-array normalisation,
Irizarry et al. 2002) in the statistical application R. Redundant probesets (as defined from Unigene
build 168) with high correlation (>0.5) over all samples were removed, which reduced the dataset to
approximately 14,400 probesets.

The microsatellite instability status classifier was based on a dataset of 4,266 genes. These genes
remain after the removal of genes with a variance over all tumor samples smaller than 0.2 and of genes
that separate Danish from Finnish samples with a t-value numerically greater than 2. We modelled each
gene with a normal distribution whose mean depends on the gene and the group (MSI, MSS). For each gene,
we calculated the variation between the groups and the variation within the groups, and selected genes
with a high ratio between these. To classify a sample, we calculated the sum over the genes of the
squared distance from the sample value to the group mean, standardized by the variance, and
assigned the sample to the nearest group. The sample to be classified was excluded when
calculating group means and variances.
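
The classification rule can be sketched as follows. This is an illustration under assumptions, not the code used in the study: it assumes a single pooled within-group variance per gene, ranks genes by the ratio of between-group to within-group sums of squares, and uses hypothetical expression values. The names `train`, `classify` and `leave_one_out_errors` are illustrative.

```python
def mean_var(values):
    """Mean and unbiased variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((x - m) ** 2 for x in values) / (len(values) - 1)


def train(samples, labels, n_genes):
    """Keep the n_genes genes with the highest between/within variation.

    Returns a list of (gene index, {group: mean}, pooled within-group variance).
    """
    groups = sorted(set(labels))
    ranked = []
    for j in range(len(samples[0])):
        stats = {g: mean_var([s[j] for s, l in zip(samples, labels) if l == g])
                 for g in groups}
        sizes = {g: labels.count(g) for g in groups}
        grand = sum(s[j] for s in samples) / len(samples)
        between = sum(sizes[g] * (stats[g][0] - grand) ** 2 for g in groups)
        within = sum((sizes[g] - 1) * stats[g][1] for g in groups)
        pooled_var = within / (len(samples) - len(groups))
        means = {g: stats[g][0] for g in groups}
        ranked.append((between / within, j, means, pooled_var))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return [(j, means, var) for _, j, means, var in ranked[:n_genes]]


def classify(model, sample):
    """Assign the sample to the group with the smallest standardized squared distance."""
    groups = model[0][1].keys()
    return min(groups, key=lambda g: sum((sample[j] - means[g]) ** 2 / var
                                         for j, means, var in model))


def leave_one_out_errors(samples, labels, n_genes):
    """Count errors when each sample is excluded from training in turn."""
    errors = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], n_genes)
        if classify(model, samples[i]) != labels[i]:
            errors += 1
    return errors


# Hypothetical expression values for three genes; only gene 0 separates the groups.
samples = [
    [8.1, 3.0, 5.2], [7.9, 3.4, 4.8], [8.3, 2.9, 5.1], [7.7, 3.2, 4.9], [8.0, 3.1, 5.0],
    [2.1, 3.1, 5.0], [1.9, 2.8, 5.1], [2.2, 3.3, 4.9], [1.8, 3.0, 5.2], [2.0, 3.2, 4.8],
]
labels = ["MSI"] * 5 + ["MSS"] * 5
loo_errors = leave_one_out_errors(samples, labels, n_genes=1)
```

Because gene selection is repeated inside every leave-one-out round, the held-out sample influences neither the gene list nor the group means and variances.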
Estimation of classifier stability

We validated the performance of the classifier by permutation. One hundred datasets consisting of
30 MSS samples and 25 MSI samples were randomly chosen for training of the
classifier, with the remaining samples in each case being assigned to a test set. Averages over the 100
datasets of the number of errors in the cross-validation of the training set and in the test set were
used as a measure of the precision of the classifier.
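
The resampling scheme can be sketched as below. To keep the example short it assumes a hypothetical one-gene nearest-mean rule in place of the full classifier and uses invented values; only the structure (fixed class sizes in the training set, remaining samples as test set, error rates averaged over 100 rounds) follows the description above.

```python
import random


def nearest_mean(train_x, train_y, x):
    """Toy stand-in classifier: assign x to the class with the closest training mean."""
    means = {}
    for g in set(train_y):
        vals = [v for v, l in zip(train_x, train_y) if l == g]
        means[g] = sum(vals) / len(vals)
    return min(means, key=lambda g: abs(x - means[g]))


def stability(values, labels, n_train, n_rounds=100, seed=1):
    """n_train maps each class to its training-set size.

    Returns the test error rate per class, pooled over all rounds.
    """
    rng = random.Random(seed)
    errors = {g: 0 for g in n_train}
    tested = {g: 0 for g in n_train}
    for _ in range(n_rounds):
        train_idx = []
        for g, n in n_train.items():
            idx = [i for i, l in enumerate(labels) if l == g]
            train_idx += rng.sample(idx, n)
        in_training = set(train_idx)
        tx = [values[i] for i in train_idx]
        ty = [labels[i] for i in train_idx]
        for i in range(len(values)):
            if i in in_training:
                continue
            tested[labels[i]] += 1
            if nearest_mean(tx, ty, values[i]) != labels[i]:
                errors[labels[i]] += 1
    return {g: errors[g] / tested[g] for g in n_train}


# Hypothetical data: one MSS-labelled sample (9.0) looks like an MSI sample.
mss = [2.0, 1.8, 2.2, 2.1, 1.9, 2.3, 1.7, 2.0, 2.1, 1.9, 9.0]
msi = [8.0, 7.8, 8.2, 8.1, 7.9, 8.3, 7.7, 8.0]
values = mss + msi
labels = ["MSS"] * len(mss) + ["MSI"] * len(msi)
rates = stability(values, labels, {"MSS": 6, "MSI": 5})
```

In this toy setting all test errors come from the single atypical sample, which mirrors the observation that a few tumors can account for most of the errors across permutations.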

Real-time PCR (RT-PCR). The procedures were as described (Birkenkamp-Demtroder et al.) except
that we used short LNA (Locked Nucleic Acid) enhanced probes from a Human Probe Library
(Exiqon™). In short, cDNA was synthesized from single samples, some of which were previously
analyzed on GeneChips. Reverse transcription was performed using SuperScript II RT (Invitrogen).
Real-time PCR analysis was performed on selected genes using the primers (DNA Technology) and
probes (Exiqon, DK) described in figure legend X. All samples were normalized to GAPDH as
described previously (Birkenkamp-Demtroder et al., Cancer Res. 62: 4352-4363, 2002).

Rebuilding of Classifier based on Real-Time PCR 

The array data from the 79 tumor samples that were not analysed by real-time PCR were transformed
into log ratios using one of the tumor samples as reference and used for training of the classifier.
Then the PCR data from 23 samples, of which 18 were also analyzed on arrays, were likewise transformed
into log ratios using the same tumor sample as reference and tested. The idea behind this translation
is that we expect the normalized PCR values to be proportional to the normalized array values, and on
a log scale this becomes an additive difference. The difference is gene specific and is therefore
estimated for each gene separately. The variation obtained from the microarray data, and used in the
classifier, can be used directly on the PCR platform.
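
The reasoning behind the log-ratio transformation can be checked with a short sketch. The gene names, signal values and proportionality factors below are hypothetical; the only assumption carried over from the text is that the PCR value of a gene is proportional to its array value with a gene-specific factor.

```python
import math


def log_ratios(sample, reference):
    """Per-gene log2 ratio of a sample against a reference sample on the same platform."""
    return {g: math.log2(sample[g] / reference[g]) for g in sample}


# Hypothetical gene-specific proportionality factors (PCR value = factor * array value).
factor = {"geneA": 0.02, "geneB": 3.5}

# Hypothetical array signals for the reference tumor and one other tumor.
array_ref = {"geneA": 400.0, "geneB": 120.0}
array_sample = {"geneA": 1600.0, "geneB": 30.0}

# Corresponding PCR values under the proportionality assumption.
pcr_ref = {g: factor[g] * v for g, v in array_ref.items()}
pcr_sample = {g: factor[g] * v for g, v in array_sample.items()}

array_lr = log_ratios(array_sample, array_ref)
pcr_lr = log_ratios(pcr_sample, pcr_ref)
```

Because the gene-specific factor appears in both the sample and the reference, it cancels in the ratio, so `array_lr` and `pcr_lr` agree gene by gene. Real data would deviate from this by measurement noise.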
Results

Hierarchical Clustering

The clinical specimens used in this study were collected in two different countries from 14 different
clinics in the period 1994 to 2001. The samples were selected to keep a balanced representation of
microsatellite instable (MSI) and microsatellite stable (MSS) tumors from both the right- and
left-sided colon. The MSI class was represented both by sporadic MSI and hereditary MSI (HNPCC)
tumors. Only Dukes' B and Dukes' C tumor samples were included (Table 1). Before
any attempt to divide a diverse sample collection into distinct classes, we analyzed the data for
systematic bias that may have been introduced during the experimental procedures. A fast and easy
way to discover both true distinct classes and systematic biases in the data is to perform
hierarchical clustering.

The phylogenetic tree resulting from hierarchical clustering on 1239 genes (Fig. 1) reveals that the
main separating factor is microsatellite status. On the upper trunk we find two clusters represented
mainly by normal biopsies (14/21) and MSS tumors (18/25), respectively. The lower trunk is
divided into an MSI cluster (30/36) and a second MSS cluster (MSS2-cluster) (34/37). A closer
inspection of the two MSS clusters reveals that one is dominated by Danish samples (19/25) and one
by Finnish samples (26/37). It is also worth noting that the MSI cluster contains a vast
majority of Finnish samples (32/36) and that the sporadic MSI samples are interspersed among the
hereditary samples. The normal biopsies cluster tightly together with a slight tendency to separation
according to origin. Three normal samples cluster within the MSI cluster, indicating that resection of
these samples may have been too close to the tumor lesion.

Inspection of the gene cluster dendrogram shows that the two groups of MSS tumors are mainly
separated by a large cluster of genes being upregulated in the Danish samples (data not shown),
indicating a systematic difference between Danish and Finnish samples.



Significance of Observed Groups 

Based on these observations, we performed a series of tests to evaluate whether the observed separations
of tumors into MSS and MSI as well as into DK and SF are significant. For these tests the tumor samples
were grouped into four virtual tumor groups, i.e. Danish MSI (MSI-DK), Danish MSS
(MSS-DK), Finnish MSI (MSI-SF) and Finnish MSS (MSS-SF). Based on 5082 genes with a
variance above 0.2, we tested if all four groups are significant or if some of the groups can be
joined. We considered the two possibilities of joining DK and SF, and of joining MSI and MSS, and
made a statistical test where the p-value is evaluated through permutations. For each group
combination, the observed test value S1/S2 is considerably smaller than in all 100 permutations (Table
2), demonstrating a very clear separation between DK and SF and between MSI and MSS. Such a
clear distinction between groups may rely on a few highly separating genes or on a general difference
in the gene expression profile involving many genes. For both the DK-SF and the MSI-MSS separation the
effect is caused by many genes, even at very strict criteria, i.e. low values of the test statistic
S1(j)/S2(j) (Table 3).
When a property is present that influences a large proportion of the genes, this may obscure the
separation of clinically relevant features in unsupervised clustering. To visualize the effect of such
properties, we calculated distances between samples by multidimensional scaling with and without
the 816 genes separating DK from SF with a t-value numerically greater than 2 (Fig. 2). Without these
genes we see an improved separation of MSI and MSS with Danish and Finnish cases mixed. The MSI-DK
samples are not completely separated, as they are found both between the MSI-SF and the MSS samples.
(These plots are not entirely unsupervised since the groups have been used to remove genes.)



Construction of an MSI-MSS classifier 

For the construction of a classifier we used the expression profiles from 97 tumors for which no
ambiguity had been identified in relation to microsatellite status. The 816 genes separating DK from
SF were excluded, as these would be unreliable for microsatellite status classification. We built a
maximum likelihood classifier in order to select a minimum of genes giving the largest possible
separation of the two groups. We tested the performance of the classifier using 1-1000 genes and found
that it was stable, showing 3-6 errors when using 4-400 genes. Of these, 106 genes were especially suited
for discrimination of MSS from MSI (Table 4). The minimum of three errors was found even when using
only 7 genes (Table 5).

Classification of ambiguous samples 

Application of the 7-gene classifier to the four samples showing ambiguity in the microsatellite
analyses assigned all four to the microsatellite stable tumor class. Notably, all four showed expression
levels of transforming growth factor β induced protein (TGFBI), MLH1 and thymidylate synthase
(TYMS) that are atypical for MSI tumors. Furthermore, these tumors were all from the left colon.
Thus these tumors are either truly MSS or they belong to a yet undefined class of MSI
tumors.

Classification of Colon Cancer
Stability of classification

To estimate the stability of the classifier based on all 97 tumor samples, we generated one hundred
new classifiers based on randomly chosen datasets consisting of 30 MSS and 25 MSI samples. In
each case the classifiers were tested with the remaining samples. The performance for each set was
evaluated and averaged over all 100 training and test sets (Table 6). The mean error rate was 0.52% for
MSS tumors and 1.38% for MSI tumors. The seven genes defined above were found to be
the genes most frequently used in the crossvalidation loop. More than 50% of the errors
were related to three tumors, of which two were wrongly classified in all permutations and one in
94%. The remaining errors were mainly caused by four tumors with error rates of 40-47%, showing
that the former three samples are consistently assigned contrary to the result of the microsatellite
analysis and that the latter four samples could not be assigned with confidence to either of the classes.

Survival classifier 

Using the same classification methods described above, we built classifiers for survival based on
either all samples or the above defined groups of MSI-H and MSS. As seen in figure 3, a distinction
of patients with good prognosis (>5 years survival) from patients with bad prognosis (<5 years
survival) can be achieved with higher precision and using only a fraction of the genes by first
separating into MSI-H and MSS groups.

Construction of a classifier for sporadic versus hereditary microsatellite instable tumors 
In order to identify a gene set for identification of hereditary microsatellite instable tumors we
subjected 19 sporadic microsatellite instable samples and 18 hereditary microsatellite instable samples
to supervised classification as described above. We found ten genes that scored highly for separation of
sporadic MSI-H from hereditary MSI-H tumours (Table 8). In crossvalidation we found a minimum
of one error using two genes (Fig. 4A); these two genes were used in at least 36 of the 37 crossvalidation
loops. The genes were the mismatch repair gene MLH1, which shows a general downregulation in
sporadic disease, and PIWIL1, which has lower expression in hereditary cases (Fig. 4B). Using these two
genes only one error occurred: a sporadic microsatellite instable tumor was classified as hereditary.
Based on a t-test we performed 500 permutations to test the significance of these two genes as marker
genes and found both genes highly significant with p-values < 0.005.
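
A permutation test built on a t statistic can be sketched as follows. The exact form of the test used in the study is not given, so this example assumes a Welch t statistic, a two-sided comparison, and the common (hits + 1) / (permutations + 1) estimate of the p-value. The expression values are hypothetical.

```python
import random


def t_stat(a, b):
    """Welch t statistic for two lists of values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)


def permutation_p(a, b, n_perm=500, seed=0):
    """Two-sided permutation p-value for the difference between groups a and b."""
    rng = random.Random(seed)
    observed = abs(t_stat(a, b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        if abs(t_stat(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression of one marker gene in sporadic and hereditary MSI-H tumors.
sporadic = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
hereditary = [3.0, 3.2, 2.9, 3.1, 3.3, 2.8]
p_value = permutation_p(sporadic, hereditary)
```

With 500 permutations the smallest p-value this estimate can return is 1/501, roughly 0.002.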

Cross platform classification 

Real-time PCR was applied both to verify the array data and to examine whether the 7-gene classifier
would also perform on this platform. We chose 23 samples, of which 18 were also analyzed on arrays. The
correlation between the two platforms was high (data not shown). In order to test the performance
of classification using PCR data we rebuilt our classifier with an array dataset of 79 samples, including
only those tumors that were not analyzed with PCR. Two samples were classified in discordance
with the microsatellite instability test, one of which was ambiguously classified by the 7-gene
array classifier.



Relation between microsatellite-instability status, stage and survival

Of 36 patients with Dukes' B tumors receiving no adjuvant chemotherapy, the 7-gene classifier
classified 18 as MSI tumors and 18 as MSS tumors. The overall
survival was highly significantly related to the classification, since all nine patients who died within
five years of follow-up belonged to the MSS group (P=0.0014) (Fig. 5A). Thus, the 7-gene
classifier clearly proved to be a strong predictor of survival in Dukes' B and it can be used to select
patients who need adjuvant chemotherapy, namely those classified as MSS.

Among 65 patients with Dukes' C tumors receiving adjuvant chemotherapy, 17 were classified as
MSI tumors and 48 as MSS tumors. Of these, 6 MSI and 27 MSS patients died within five years of
follow-up, meaning there was no significant difference in overall survival between these groups (P=0.55)
(Fig. 5B). There was a trend for the MSI patients to show poorer short-term survival than the MSS
patients, contrary to Dukes' B patients. This difference can be attributed to the finding of a recent
large study that chemotherapy only benefits the MSS tumor patients, thus improving their survival to a
level comparable to that which is characteristic of MSI tumor patients.

Clinical application of the discovery 

In the clinic the 106 or fewer genes described can be used for predicting outcome of colorectal cancer
when examined at the RNA level and also at the protein level, as each gene identified in the project
is transcribed to RNA that is further translated into protein. The genes can also be used to determine
which patients should be treated with chemotherapy, as only non-microsatellite instable tumors will
respond to 5-FU based therapy. Building classifiers can achieve a further stratification of patients
with good and bad prognosis after stratification into microsatellite instable and stable tumors. The
genes used to identify hereditary disease can be used to decide which patients should enter into
sequencing analysis of mismatch repair genes.

The RNA determination can be made in any form using any method that will quantify RNA. The
proteins can be measured with any quantification method that can determine the level of
proteins.
Figure Titles

Figure 1. Phylogenetic tree resulting from unsupervised hierarchical clustering.
Figure 2. Multidimensional scaling plot.
Figure 3. Performance of prediction of survival before and after separation in MSI-H and MSS
tumors.
Figure 4. Performance of the classifier for identification of hereditary disease.
Figure 5. Kaplan-Meier estimates of overall survival.

Table 1. Summary of clinicopathological and microsatellite features of colon cancer samples
Table 2. Permutation test of groups
Table 3. Permutation test of genes
Table 4. Performance of the classifier
Table 5. Genes used for the classification of MSS vs MSI tumors
Figure 1. Cluster analysis of Colon Specimens with Associated Clinicopathological Features
Figure 2. Multidimensional Analysis showing distances between groups of tumors.
Figure 4
Figure 5. Kaplan-Meier Estimates of Overall Survival among Patients with Dukes' B and
Dukes' C Colon Cancer According to the Microsatellite-Instability Status of the Tumor.
Table 2. Permutation test of groups

Pseudo group   S1/S2 from data   Smaller values in 100 permutations   Minimum in 100 permutations
DK-SF          0.9072795         0                                    0.962269
I-S            0.9166195         0                                    0.9583325
Table 3. Permutation test of genes

                                       S1(j)/S2(j)
Pseudo group                           < 0.6   < 0.7   < 0.8   < 0.9
DK-SF     number of genes              36      136     522     1785
          max in 100 permutations      0       0       2       225
MSI-MSS   number of genes              17      103     399     1507
          max in 100 permutations      0       1       8       250
Table 5. Genes used for the classification of MSS vs MSI tumors

Name                                          Symbol     Unigene     MSS    MSI
hepatocellular carcinoma-associated antigen   HCA112     Hs.12126    1261   653
metastasis-associated 1-like 1                MTA1L1     Hs.173043   45     91
chemokine (C-X-C motif) ligand 10             CXCL10     Hs.2248     104    274
heterogeneous nuclear ribonucleoprotein L     HNRPL      Hs.2730     194    630
hypothetical protein FLJ20618                 FLJ20618   Hs.52184    776    388
splicing factor, arginine/serine-rich 6       SFRS6      Hs.6891     74     446
protein kinase C binding protein 1            PRKCBP1    Hs.75871    294    168
Table 6. Performance of the classifier

       Training set                  Test set
       Errors in crossvalidation     Test errors
MSI    2.8% (n=25, range 0-6)        1.4% (n=40, range 0-4)
MSS    0.70% (n=30, range 0-3)       0.52% (n=29, range 0-2)
All    1.7% (n=55, range 1-7)        1.9% (n=39, range 0-5)
Table 7.

Positive for MSS    True = (0.9948*29) = 28.8492    False = (0.138*10) = 1.38
Negative for MSS    False = (0.0052*29) = 0.1508    True = (0.962*10) = 9.62

Sensitivity                  28.9507/29         99.5%
Specificity                  9.62/10            96.2%
Positive predictive value    28.8492/30.2292    95.4%
Negative predictive value    9.62/9.7708        98.5%

* Based on a prevalence for MSS of 85%
Table 8

AFFYDESCRIPTION
Homeo box C4
Piwi (Drosophila)-like 1
MutL (E. coli) homolog 1 (colon cancer, nonpolyposis type 2)
Collapsin response mediator protein 1
Homeo box B2 (HOXB2)
Pyrroline-5-carboxylate synthetase (glutamate gamma-semialdehyde synthetase) (P5CS)
TGFB inducible early growth response (TIEG)
Checkpoint with forkhead and ring finger domains (CHFR)
Hypothetical protein FLJ13842 (FLJ13842)
Phosphoprotein regulated by mitogenic pathways (C8FW)



